Abstract: In multi-terminal networks, feedback increases the
capacity region and helps communication devices to coordinate. 
In this article, we deepen the relationship between coordination 
and feedback by considering a point-to-point scenario with an 
information source and a noisy channel. Empirical coordination 
is achievable if the encoder and the decoder can implement 
sequences of symbols that are jointly typical for a target 
probability distribution. We investigate the impact of feedback 
when the encoder has strictly causal or causal observation of 
the source symbols. For both cases, we characterize the optimal 
information constraints and we show that feedback improves 
coordination possibilities. Surprisingly, feedback also reduces 
the number of auxiliary random variables and simplifies the 
information constraints. For empirical coordination with strictly
causal encoding and feedback, the information constraint no
longer involves any auxiliary random variable.

Index Terms: Shannon Theory, Feedback, Empirical Coordination,
Joint Source-Channel Coding, Empirical Distribution of
Symbols, Strictly Causal and Causal Encoding.

I. Introduction 

Feedback does not increase the capacity of a memoryless
channel [?]. However, it has a significant impact on
problems of empirical coordination. In this framework, the
encoder and the decoder are considered as autonomous agents
[?] that implement a coding scheme in order to coordinate
their sequences of actions, i.e. channel inputs and decoder
outputs, with a sequence of source symbols. The problem
of empirical coordination [?], [?], [?] consists in determining
the set of joint probability distributions that are achievable
for the empirical frequencies of symbols. Empirical coordination
provides a single-letter solution that simplifies the analysis
of optimization problems such as minimal source distortion,
minimal channel cost or maximal utility function of a
decentralized communication network [?]. For example, the optimal
distortion level is the minimum of the expected distortion
function, taken over the set of achievable joint probability
distributions.

In the framework of multi-terminal networks, feedback
increases the capacity region of the multiple-access channel
[?], [?] and of the broadcast channel [?], [?]. In the game
theory literature, feedback is considered from a strategic point
of view. In [2], a player observes the past actions of another
player through a monitoring structure involving perfect or
imperfect feedback. In [?], the authors investigate a four-player
coordination game with imperfect feedback and provide
a subset of achievable joint probability distributions. Empirical
coordination is a first step toward a better understanding of
decentralized communication networks. The set of achievable
joint distributions was characterized for strictly causal and
causal decoding in [6], with two-sided state information in [12]
and with feedback from the source in [13]. From a practical
perspective, coordination with polar codes was considered in
[?]. Lossless decoding with correlated information source and
channel states is solved in [?]. Empirical coordination for
multi-terminal source coding is treated in [?] and in [?].



Fig. 1. Strictly causal encoding function with feedback $f_i : \mathcal{U}^{i-1} \times \mathcal{Y}^{i-1} \to \mathcal{X}$, for all $i \in \{1, \ldots, n\}$ and non-causal decoding function $g : \mathcal{Y}^n \to \mathcal{V}^n$.


In this article, we consider the point-to-point scenario of
[18] with channel feedback, as represented by Fig. 1 and 2.
The encoder has perfect feedback from the channel and strictly
causal or causal observation of the source symbols. In both
cases, we characterize the set of achievable joint probability
distributions over the symbols of source and channel. We
show that the information constraints are larger than the ones
stated in [18], i.e. feedback improves the coordination
possibilities. Surprisingly, feedback also reduces the number
of auxiliary random variables and simplifies the information
constraints. For empirical coordination with strictly causal
encoding and feedback, the information constraint no longer
involves any auxiliary random variable. There is an analogy
with strictly causal decoding [6], [13], since no auxiliary
random variable is needed when the decoder has feedback
from the source. For empirical coordination problems, feedback
allows auxiliary random variables to be removed from the
information constraints.

The system model and definitions are stated in Sec. II and the
characterizations of achievable joint distributions are stated in
Sec. III. A comparison with previous works and an example are
stated in Sec. IV and V. Conclusions and sketches of proofs
are stated in Sec. VI and in Appendices A, B and C.









II. System model 

Figure 1 represents the problem under investigation. A random
variable $U$ is denoted by a capital letter, the lowercase letter
$u \in \mathcal{U}$ designates its realization and $\mathcal{U}^n$ corresponds to
the $n$-time Cartesian product. $U^n$, $X^n$, $Y^n$, $V^n$ stand for the
sequences of random variables of source symbols $u^n = (u_1, \ldots, u_n) \in \mathcal{U}^n$,
inputs of the channel $x^n \in \mathcal{X}^n$, outputs
of the channel $y^n \in \mathcal{Y}^n$ and decoder's outputs $v^n \in \mathcal{V}^n$.
The sets $\mathcal{U}$, $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{V}$ are discrete. The set of probability
distributions $\mathcal{P}(X)$ over $\mathcal{X}$ is denoted by $\Delta(\mathcal{X})$. Notation
$\|\mathcal{Q} - \mathcal{P}\|_{tv} = 1/2 \cdot \sum_{x \in \mathcal{X}} |\mathcal{Q}(x) - \mathcal{P}(x)|$ stands for the
total variation distance between probability distributions $\mathcal{Q}$
and $\mathcal{P}$. Notation $Y -\!\!\circ\!\!- X -\!\!\circ\!\!- U$ stands for the Markov
chain property corresponding to $\mathcal{P}(y|x,u) = \mathcal{P}(y|x)$ for all
$(u,x,y)$. The information source is i.i.d. with distribution $\mathcal{P}_u$ and
the channel is memoryless with transition probability $\mathcal{T}_{y|x}$.
Encoder $\mathcal{C}$ and decoder $\mathcal{D}$ know the statistics $\mathcal{P}_u$ and $\mathcal{T}_{y|x}$ of
the source and channel. The coding process is deterministic.


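Definition II.1 A code $c \in \mathcal{C}(n)$ with strictly causal encoder
and feedback is a tuple of functions $c = (\{f_i\}_{i=1}^n, g)$ defined
by equations (1) and (2):

$$f_i : \mathcal{U}^{i-1} \times \mathcal{Y}^{i-1} \longrightarrow \mathcal{X}, \quad i = 1, \ldots, n, \tag{1}$$

$$g : \mathcal{Y}^n \longrightarrow \mathcal{V}^n. \tag{2}$$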

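The number of occurrences of symbol $u \in \mathcal{U}$ in sequence $u^n$ is
denoted by $N(u|u^n)$. The empirical distribution $Q^n \in \Delta(\mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V})$
of sequences $(u^n, x^n, y^n, v^n)$ is defined by:

$$Q^n(u,x,y,v) = \frac{N(u,x,y,v|u^n,x^n,y^n,v^n)}{n}, \quad \forall (u,x,y,v) \in \mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V}. \tag{3}$$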


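Fix a target probability distribution $\mathcal{Q} \in \Delta(\mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V})$.
The error probability of the code $c \in \mathcal{C}(n)$ is defined by:

$$\mathcal{P}_e(c) = \mathcal{P}_c\Big( \big\| Q^n - \mathcal{Q} \big\|_{tv} \geq \varepsilon \Big), \tag{4}$$

where $Q^n \in \Delta(\mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V})$ is the random variable of the
empirical distribution induced by the probability distributions
$\mathcal{P}_u$, $\mathcal{T}_{y|x}$ and the code $c \in \mathcal{C}(n)$.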
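The information structure of such a code can be made concrete with a short simulation. The following is a minimal sketch, not the scheme used in the proofs: the helper names `run_code` and `bsc`, the toy encoder `repeat_last` and the identity decoder are all hypothetical choices made for illustration. It only shows that $x_i$ is computed from $(u^{i-1}, y^{i-1})$ and that the decoder sees the whole sequence $y^n$.

```python
import random

def run_code(encoders, decoder, source, channel):
    """Run a code c = ({f_i}, g): x_i = f_i(u^{i-1}, y^{i-1}), then v^n = g(y^n)."""
    xs, ys = [], []
    for i, f_i in enumerate(encoders):
        # Strictly causal encoding with feedback: only past source symbols
        # and past channel outputs are available at stage i.
        x_i = f_i(tuple(source[:i]), tuple(ys))
        xs.append(x_i)
        ys.append(channel(x_i))
    # Non-causal decoding: the decoder observes the whole output sequence.
    vs = list(decoder(tuple(ys)))
    return xs, ys, vs

def bsc(eps, rng):
    """Binary symmetric channel with crossover probability eps (hypothetical helper)."""
    return lambda x: x ^ int(rng.random() < eps)

# Toy example (assumption): the encoder repeats the previous source symbol,
# the channel is noiseless and the decoder is the identity map.
rng = random.Random(0)
n = 8
u = [rng.randint(0, 1) for _ in range(n)]
repeat_last = lambda u_past, y_past: u_past[-1] if u_past else 0
x, y, v = run_code([repeat_last] * n, lambda y_all: y_all, u, bsc(0.0, rng))
```

With this toy code, the decoder output is the source sequence delayed by one symbol. A causal encoder would be obtained by passing `source[:i + 1]` instead of `source[:i]` to `f_i`.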


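Definition II.2 The probability distribution $\mathcal{Q} \in \Delta(\mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V})$
is achievable if for all $\varepsilon > 0$, there exists $\bar{n} \in \mathbb{N}$ s.t.
for all $n \geq \bar{n}$, there exists a code $c \in \mathcal{C}(n)$ that satisfies:

$$\mathcal{P}_e(c) = \mathcal{P}_c\Big( \big\| Q^n - \mathcal{Q} \big\|_{tv} \geq \varepsilon \Big) \leq \varepsilon. \tag{5}$$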


The error probability $\mathcal{P}_e(c)$ is small if the total variation
distance between the empirical frequency of symbols
$Q^n(u,x,y,v)$ and the target probability distribution
$\mathcal{Q}(u,x,y,v)$ is small, with large probability. In that case,
the sequences of symbols $(U^n, X^n, Y^n, V^n) \in A_{\varepsilon}^{\star n}(\mathcal{Q})$ are
jointly typical, i.e. coordinated, for the target probability
distribution $\mathcal{Q}$ with large probability.
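The empirical distribution and the total variation distance are easy to compute for given sequences. The sketch below is an illustration only: the function names are hypothetical, distributions are represented as Python dictionaries, and the error event is assumed to be that the total variation distance reaches the tolerance `eps`.

```python
from collections import Counter

def empirical_distribution(u, x, y, v):
    """Empirical frequency of the symbol tuples (u_i, x_i, y_i, v_i)."""
    n = len(u)
    counts = Counter(zip(u, x, y, v))
    return {symbols: count / n for symbols, count in counts.items()}

def total_variation(q, p):
    """Total variation distance 1/2 * sum |q(s) - p(s)| over the joint support."""
    support = set(q) | set(p)
    return 0.5 * sum(abs(q.get(s, 0.0) - p.get(s, 0.0)) for s in support)

def is_error(u, x, y, v, target, eps):
    """Error event (assumed form): the empirical distribution is eps-far from the target."""
    return total_variation(empirical_distribution(u, x, y, v), target) >= eps

# Toy sequences of length 4 (hypothetical data).
u = [0, 1, 0, 1]
x = [0, 0, 1, 1]
y = list(x)
v = list(u)
q_n = empirical_distribution(u, x, y, v)
point_mass = {(0, 0, 0, 0): 1.0}
tv_to_point_mass = total_variation(q_n, point_mass)
```

Here the four observed tuples are distinct, so `q_n` puts mass 1/4 on each of them and its distance to the point mass on (0, 0, 0, 0) is 3/4.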

As mentioned in [?] and [?], the performance of the
coordination can be evaluated using an objective function
$\Phi : \mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V} \to \mathbb{R}$. We denote by $\mathcal{A}^{\star}$ the set of joint
probability distributions $\mathcal{Q}$ that are achievable. Based
on the expectation $\mathbb{E}_{\mathcal{Q}}\big[\Phi(U,X,Y,V)\big]$ with $\mathcal{Q} \in \mathcal{A}^{\star}$, it is possible to
derive the minimal channel cost $\Phi(u,x,y,v) = c(x)$, the
minimal distortion level $\Phi(u,x,y,v) = d(u,v)$ or the maximal
utility of a decentralized network [?], using a single-letter
characterization.
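For a single candidate distribution, the expectation of the objective function is a finite sum. The sketch below uses a hypothetical joint distribution and two illustrative objectives, the Hamming distortion $d(u,v)$ and the channel cost $c(x) = x$. It does not perform the optimization over the achievable set.

```python
def expected_value(q, phi):
    """Expectation of phi(u, x, y, v) under a joint distribution q given as a dict."""
    return sum(p * phi(u, x, y, v) for (u, x, y, v), p in q.items())

# Hypothetical joint distribution over (u, x, y, v), for illustration only.
q = {(0, 0, 0, 0): 0.4, (0, 0, 0, 1): 0.1, (1, 1, 1, 1): 0.5}

hamming = lambda u, x, y, v: float(u != v)   # distortion d(u, v)
cost = lambda u, x, y, v: float(x)           # channel cost c(x)

expected_distortion = expected_value(q, hamming)
expected_cost = expected_value(q, cost)
```

Minimizing such an expectation over the set of achievable distributions would give the optimal distortion or cost level.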


III. Characterization of achievable distributions 

This section presents the two main results of this article.
Theorem III.1 characterizes the set of achievable joint
probability distributions for strictly causal encoding with feedback,
represented in Fig. 1.


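Theorem III.1 (Strictly causal encoding with feedback)

1) If the joint probability distribution $\mathcal{Q}(u,x,y,v)$ is
achievable, then it decomposes as follows:

$$\begin{cases} \mathcal{Q}(u) = \mathcal{P}_u(u), \quad \mathcal{Q}(y|x) = \mathcal{T}(y|x), \\ U \text{ independent of } X, \quad Y -\!\!\circ\!\!- X -\!\!\circ\!\!- U. \end{cases} \tag{6}$$

2) The joint probability distribution $\mathcal{P}_u(u) \otimes \mathcal{Q}(x) \otimes \mathcal{T}(y|x) \otimes \mathcal{Q}(v|u,x,y)$
is achievable if:

$$I(X;Y) - I(U;V|X,Y) > 0. \tag{7}$$

3) The joint probability distribution $\mathcal{P}_u(u) \otimes \mathcal{Q}(x) \otimes \mathcal{T}(y|x) \otimes \mathcal{Q}(v|u,x,y)$
is not achievable if:

$$I(X;Y) - I(U;V|X,Y) < 0. \tag{8}$$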

The sketch of proof of Theorem III.1 is stated in Appendix A.
Equation (7) comes from Theorem 3 in [18] by replacing
the auxiliary random variable by the decoder's output $V$ and the
observation of the encoder by the pair $(U,Y)$ of information source
and channel feedback.

A causal encoding function is defined by $f_i : \mathcal{U}^{i} \times \mathcal{Y}^{i-1} \to \mathcal{X}$,
$\forall i \in \{1, \ldots, n\}$. Theorem III.2 characterizes the set of
achievable joint probability distributions for causal encoding
with feedback, represented in Fig. 2.



Fig. 2. Causal encoding function with feedback $f_i : \mathcal{U}^{i} \times \mathcal{Y}^{i-1} \to \mathcal{X}$, for all $i \in \{1, \ldots, n\}$ and non-causal decoding function $g : \mathcal{Y}^n \to \mathcal{V}^n$.


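Theorem III.2 (Causal encoding with feedback)

1) If the joint probability distribution $\mathcal{Q}(u,x,y,v)$ is
achievable, then it decomposes as follows:

$$\mathcal{Q}(u) = \mathcal{P}_u(u), \quad \mathcal{Q}(y|x) = \mathcal{T}(y|x), \quad Y -\!\!\circ\!\!- X -\!\!\circ\!\!- U. \tag{9}$$

2) The joint probability distribution $\mathcal{P}_u(u) \otimes \mathcal{Q}(x|u) \otimes \mathcal{T}(y|x) \otimes \mathcal{Q}(v|u,x,y)$
is achievable if:

$$\max_{\mathcal{Q} \in \mathbb{Q}} \Big( I(W;Y) - I(U;V|W,Y) \Big) > 0. \tag{10}$$

3) The joint probability distribution $\mathcal{P}_u(u) \otimes \mathcal{Q}(x|u) \otimes \mathcal{T}(y|x) \otimes \mathcal{Q}(v|u,x,y)$
is not achievable if:

$$\max_{\mathcal{Q} \in \mathbb{Q}} \Big( I(W;Y) - I(U;V|W,Y) \Big) < 0, \tag{11}$$

where $\mathbb{Q}$ is the set of probability distributions $\mathcal{Q} \in \Delta(\mathcal{U} \times \mathcal{W} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V})$
with auxiliary random variable $W$ that satisfy:

$$\sum_{w \in \mathcal{W}} \mathcal{Q}(u,w,x,y,v) = \mathcal{P}_u(u) \otimes \mathcal{Q}(x|u) \otimes \mathcal{T}(y|x) \otimes \mathcal{Q}(v|u,x,y),$$

$$U \text{ independent of } W,$$

$$Y -\!\!\circ\!\!- X -\!\!\circ\!\!- (U,W),$$

$$V -\!\!\circ\!\!- (U,Y,W) -\!\!\circ\!\!- X.$$

The probability distribution $\mathcal{Q} \in \mathbb{Q}$ decomposes as follows:

$$\mathcal{P}_u(u) \otimes \mathcal{Q}(w) \otimes \mathcal{Q}(x|u,w) \otimes \mathcal{T}(y|x) \otimes \mathcal{Q}(v|u,y,w).$$

The support of $W$ is bounded by $|\mathcal{W}| \leq |\mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V}| + 2$.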

The sketches of proofs of Theorem III.2 are stated in Appendices B
and C. The random variable $V$ is directly correlated with the
pair $(U,Y)$ of source and channel output. Feedback implies
that $V$ is extracted from the Markov chain $Y -\!\!\circ\!\!- X -\!\!\circ\!\!- (U,W)$
of the memoryless channel.

IV. Feedback improves empirical coordination

In this section, we investigate the impact of feedback on
the set of achievable joint distributions stated in Theorems III.1
and III.2. Considering strictly causal encoding, we evaluate the
difference between the information constraint stated in equation
(7) and the one stated in Theorem 3 in [18] without feedback.


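$$\begin{aligned}
& I(X;Y) - I(U;V|X,Y) && (12) \\
& \quad - \max_{\mathcal{Q} \in \mathbb{Q}_{se}} \Big( I(X;Y) - I(U;W_2|X) \Big) && (13) \\
&= \min_{\mathcal{Q} \in \mathbb{Q}_{se}} I(U;W_2|X) - I(U;V|X,Y) && (14) \\
&= H(U|V,X,Y) - \max_{\mathcal{Q} \in \mathbb{Q}_{se}} H(U|X,W_2) \geq 0. && (15)
\end{aligned}$$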

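$\mathbb{Q}_{se}$ is the set of probability distributions $\mathcal{Q} \in \Delta(\mathcal{U} \times \mathcal{W}_2 \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V})$
with auxiliary random variable $W_2$ that satisfy:

$$\mathcal{P}_u(u) \otimes \mathcal{Q}(x) \otimes \mathcal{Q}(w_2|u,x) \otimes \mathcal{T}(y|x) \otimes \mathcal{Q}(v|y,x,w_2).$$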
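The passage from the difference of information constraints to a difference of conditional entropies can be spelled out as follows. This is a sketch that assumes $U$ is independent of $X$ under every distribution of the set with auxiliary variable $W_2$, and independent of $(X,Y)$ under the target distribution, as in the decomposition of Theorem III.1.

```latex
\begin{align*}
I(U;W_2|X) &= H(U|X) - H(U|X,W_2) = H(U) - H(U|X,W_2), \\
I(U;V|X,Y) &= H(U|X,Y) - H(U|V,X,Y) = H(U) - H(U|V,X,Y), \\
\min_{\mathcal{Q}} I(U;W_2|X) - I(U;V|X,Y)
  &= H(U|V,X,Y) - \max_{\mathcal{Q}} H(U|X,W_2).
\end{align*}
```

The term $H(U)$ cancels, and minimizing $-H(U|X,W_2)$ is the same as maximizing $H(U|X,W_2)$.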

• Equation (15) is equal to zero if $(U,V)$ is independent of
$(X,Y)$. This corresponds to the lossy transmission without
coordination, in which feedback does not increase the
channel capacity [?].

• Equation (15) is equal to zero when the decoder output
$V$ is empirically coordinated with $(U,X)$ and not with the
channel output $Y$, because in that case $W_2 = V$. Since the
auxiliary random variable $W_2$ should satisfy
$\mathcal{Q}(v|y,x,u) = \sum_{w_2 \in \mathcal{W}_2} \mathcal{Q}(w_2|u,x) \cdot \mathcal{Q}(v|y,x,w_2)$, equation (12) provides
an upper bound to equation (13) that is easier to evaluate.

There is a strong analogy between strictly causal encoding
with channel feedback and strictly causal decoding with
source feedback. Equation (16) corresponds to strictly causal
decoding without feedback from the source, stated in [6].

$$\max_{\mathcal{Q} \in \mathbb{Q}_{sd}} \Big( I(W_1;Y|V) - I(U;V,W_1) \Big) > 0. \tag{16}$$



Fig. 3. Non-causal encoding $f : \mathcal{U}^n \to \mathcal{X}^n$ and causal decoding $g_i : \mathcal{Y}^{i} \times \mathcal{U}^{i-1} \to \mathcal{V}$ for all $i \in \{1, \ldots, n\}$ with feedback from the source.

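$\mathbb{Q}_{sd}$ is the set of probability distributions $\mathcal{Q} \in \Delta(\mathcal{U} \times \mathcal{W}_1 \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V})$
with auxiliary random variable $W_1$ that satisfy:

$$\mathcal{P}_u(u) \otimes \mathcal{Q}(x,v|u) \otimes \mathcal{Q}(w_1|u,x,v) \otimes \mathcal{T}(y|x).$$

Equation (17) corresponds to strictly causal decoding with
feedback from the source, characterized in [13].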


$$I(X;Y|U,V) - I(U;V) > 0. \tag{17}$$

Equation (17) can be deduced from equation (16) by replacing
the auxiliary random variable $W_1$ by $X$ and the observation
of the decoder $Y$ by the pair $(U,Y)$.
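The substitution can be checked with the chain rule for mutual information. The following worked step is a sketch that starts from the expression $I(W_1;Y|V) - I(U;V,W_1)$ of the constraint without source feedback, with $W_1$ replaced by $X$ and $Y$ replaced by $(U,Y)$.

```latex
\begin{align*}
I(X;U,Y|V) - I(U;V,X)
  &= I(X;U|V) + I(X;Y|U,V) - I(U;V) - I(U;X|V) \\
  &= I(X;Y|U,V) - I(U;V).
\end{align*}
```

The two terms $I(X;U|V)$ and $I(U;X|V)$ cancel, which leaves the constraint with feedback from the source.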

This analysis extends to causal decoding with feedback from
the source, represented by Fig. 3 and characterized by (18).

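$$\max_{\mathcal{Q} \in \mathbb{Q}_{df}} \Big( I(X;Y|U,W_3) - I(U;W_3) \Big) > 0. \tag{18}$$

$\mathbb{Q}_{df}$ is the set of probability distributions $\mathcal{Q} \in \Delta(\mathcal{U} \times \mathcal{W}_3 \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V})$
with auxiliary random variable $W_3$ that satisfy:

$$\mathcal{P}_u(u) \otimes \mathcal{Q}(x,w_3|u) \otimes \mathcal{T}(y|x) \otimes \mathcal{Q}(v|y,w_3).$$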


The proof is in [19]. Theorems III.1 and III.2 also extend to
two-sided state information by replacing $(U,S)$ by $(U,S,Y)$
in the results of [12], for strictly causal and causal encoding.

Fig. 4. Binary information source and binary symmetric channel with
parameters $p = 1/2$ and $\varepsilon \in [0, 0.5]$.


V. Example: binary source and channel 


We consider a binary information source and a binary symmetric
channel represented by Fig. 4. The sets of symbols are
given by $\mathcal{U} = \mathcal{X} = \mathcal{Y} = \{0,1\}$ and $\mathcal{V} = \{1,2,3,4,5,6,7,8\}$.
We assume that the parameter $p \in [0,1]$ of the information source
is equal to $1/2$. The probability distribution of the channel input
is uniform: $\mathcal{Q}(X=0) = \mathcal{Q}(X=1) = 1/2$. The transition
probability of the channel depends on a noise parameter
$\varepsilon \in [0,0.5]$. Since the input distribution is uniform and the
channel is symmetric, the output probability distribution is
also uniform: $\mathcal{Q}(Y=0) = \mathcal{Q}(Y=1) = 1/2$. We investigate a
class of achievable conditional probability distributions $\mathcal{Q}_{v|uxy}$
described by Fig. 5.

[Figure 5: the triples $(U,X,Y)$ are listed in the order (0,0,0), (1,0,0), (0,1,0), (1,1,0), (0,0,1), (1,0,1), (0,1,1), (1,1,1).]

Fig. 5. Conditional probability distribution $\mathcal{Q}_{v|uxy}$ depending on parameter
$\alpha \in [0, 7/8]$ where $\mathcal{Q}(V=1|(U,X,Y)=(0,0,0)) = 1 - \alpha$ and
$\mathcal{Q}(V=2|(U,X,Y)=(0,0,0)) = \alpha/7$. For $\alpha = 7/8$, the probability distribution is
uniform over the set $\mathcal{V} = \{1, \ldots, 8\}$ and independent of the triple $(U,X,Y)$.
For $\alpha = 0$, the output $V$ corresponds exactly to the triple $(U,X,Y)$.

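We consider strictly causal encoding with feedback. The
information constraint (7) of Theorem III.1 writes:

$$\begin{aligned}
& I(X;Y) - I(U;V|X,Y) \\
&= H(Y) - H(Y|X) - H(V|X,Y) + H(V|U,X,Y) \\
&= 1 - H_b(\varepsilon) - H_b\Big(\frac{6\alpha}{7}\Big) - 1 - \frac{6\alpha}{7} \cdot \log_2 3 + H_b(\alpha) + \alpha \cdot \log_2 7 \\
&= H_b(\alpha) - H_b(\varepsilon) - H_b\Big(\frac{6\alpha}{7}\Big) + \alpha \cdot \Big(\log_2 7 - \frac{6}{7} \cdot \log_2 3\Big).
\end{aligned}$$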
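The closed-form expression can be cross-checked numerically against a direct computation from the joint distribution, and the smallest admissible parameter can be found by bisection. The sketch below makes several assumptions: all function names are hypothetical, the labelling of the output $V$ by the index of a triple is arbitrary (it does not affect entropies), and the noise level $\varepsilon = 0.1$ is chosen only as an example. The bisection relies on the constraint being increasing in $\alpha$ on $[0, 7/8]$.

```python
import math
from itertools import product

def hb(p):
    """Binary entropy H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def constraint_closed_form(alpha, eps):
    """Last line of the derivation: I(X;Y) - I(U;V|X,Y) for the binary example."""
    return (hb(alpha) - hb(eps) - hb(6 * alpha / 7)
            + alpha * (math.log2(7) - 6 / 7 * math.log2(3)))

# Assumed labelling: the output v is the index of one triple (u, x, y).
TRIPLES = list(product((0, 1), repeat=3))

def joint(alpha, eps):
    """Joint pmf over (u, x, y, v): uniform U and X, BSC(eps), then Q(v|u,x,y)."""
    q = {}
    for (u, x, y) in TRIPLES:
        p_uxy = 0.25 * ((1 - eps) if y == x else eps)
        for v, triple in enumerate(TRIPLES):
            q_v = (1 - alpha) if triple == (u, x, y) else alpha / 7
            q[(u, x, y, v)] = p_uxy * q_v
    return q

def entropy(q, keep):
    """Entropy in bits of the marginal of q on the coordinates listed in keep."""
    marginal = {}
    for outcome, p in q.items():
        key = tuple(outcome[i] for i in keep)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def constraint_numeric(alpha, eps):
    """I(X;Y) - I(U;V|X,Y) computed directly from the joint pmf."""
    q = joint(alpha, eps)
    U, X, Y, V = 0, 1, 2, 3
    i_xy = entropy(q, [X]) + entropy(q, [Y]) - entropy(q, [X, Y])
    i_uv_given_xy = (entropy(q, [U, X, Y]) + entropy(q, [V, X, Y])
                     - entropy(q, [U, X, Y, V]) - entropy(q, [X, Y]))
    return i_xy - i_uv_given_xy

def minimal_alpha(eps, tol=1e-9):
    """Bisection for the smallest alpha in [0, 7/8] with a nonnegative constraint."""
    lo, hi = 0.0, 7 / 8
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if constraint_closed_form(mid, eps) >= 0:
            hi = mid
        else:
            lo = mid
    return hi

alpha_star = minimal_alpha(0.1)
```

For $\varepsilon = 0.1$, the value returned by the bisection is close to 0.28, and the closed form agrees with the direct computation up to floating-point error.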



Fig. 6. Comparison between the information constraint for empirical
coordination with feedback $I(X;Y) - I(U;V|X,Y)$ and the information
constraint $I(X;Y) - I(U;V)$ for lossy transmission.

In Fig. 6, we compare the information constraint (7) for empirical
coordination with feedback and the information constraint (19)
for lossy transmission without coordination, where $\alpha$ is
the distortion parameter of the conditional distribution $\mathcal{Q}_{v|u}$:

$$\begin{aligned}
I(X;Y) - I(U;V) &= 1 - H_b(\varepsilon) - 1 + H_b(\alpha) && (19) \\
&= H_b(\alpha) - H_b(\varepsilon). && (20)
\end{aligned}$$

The minimal coordination parameter $\alpha^{\star} \approx 0.281 > 0.1$ is
much larger for empirical coordination than for lossy
compression. This restriction comes from the additional correlation
requirement between the decoder output $V$ and the random
variables $(X,Y)$ of the channel. Fig. 7 provides the minimal
value of parameter $\alpha^{\star} \in [0, 0.875]$ for empirical coordination,
depending on the level of noise of the channel $\varepsilon \in [0, 0.5]$.

VI. Conclusion 

We investigate the relationship between coordination and
feedback by considering a point-to-point scenario with strictly
causal and causal encoder. For both cases, we characterize the
optimal solutions and we show that feedback simplifies the
information constraints by reducing the number of auxiliary
random variables. For empirical coordination with strictly
causal encoding and feedback, the information constraint no
longer involves any auxiliary random variable.

Fig. 7. Minimal value of parameter $\alpha^{\star} \in [0, 0.875]$ for the information
constraint $I(X;Y) - I(U;V|X,Y)$ to be positive, depending on the
noise of the channel $\varepsilon \in [0, 0.5]$. It corresponds to the highest level of
coordination between the random variable $V$ and the triple $(U,X,Y)$.

Appendix 

The full versions of the proofs are stated in [?].

A. Sketch of proof of Theorem III.1

The achievability proof can be obtained from the proof of
Theorem III.2 stated in Appendix B, by replacing the auxiliary
random variable $W$ by $X$.

For the converse proof, we consider a code $c \in \mathcal{C}(n)$ with
small error probability $\mathcal{P}_e(c)$.

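$$\begin{aligned}
0 &= I(U^n;Y^n) - I(U^n;Y^n,V^n) && (21) \\
&= I(U^n;Y^n) - \sum_{i=1}^n I(U_i;Y^n,V^n,U^{i-1}) && (22) \\
&\leq \sum_{i=1}^n \Big( I(Y_i;U^n,X_i|Y^{i-1}) - I(U_i;Y^n,V^n,U^{i-1},X_i) \Big) && (23) \\
&\leq \sum_{i=1}^n \Big( H(Y_i) - H(Y_i|X_i) - I(U_i;Y_i,V_i,X_i) \Big) && (24) \\
&\leq \sum_{i=1}^n \Big( H(Y_i) + H(U_i|Y_i,V_i,X_i) \Big) - n \Big( H(Y|X) + H(U) \Big) && (25) \\
&\leq n \Big( I(X;Y) - I(U;V|X,Y) \Big). && (26)
\end{aligned}$$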

Equation (21) comes from the non-causal decoding that
induces the Markov chain $U^n -\!\!\circ\!\!- Y^n -\!\!\circ\!\!- V^n$.

Equation (22) comes from the i.i.d. property of the information
source $U$ that implies $I(U_i;U^{i-1}) = 0$.

Equation (23) comes from the channel feedback and the
strictly causal encoding function $X_i = f_i(U^{i-1},Y^{i-1})$.

Equations (24) and (25) are due to the properties of the i.i.d.
information source and of the memoryless channel.

Equation (26) comes from the concavity of the entropy function
and from the hypothesis of small error probability $\mathcal{P}_e(c)$.
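The single-letter expression reached at the end of the chain rests on an entropy identity that can be checked directly. The sketch below evaluates the terms under the target distribution and assumes, as in Theorem III.1, that $U$ is independent of $(X,Y)$, so that $I(U;X,Y) = 0$.

```latex
\begin{align*}
H(Y) + H(U|Y,V,X) - H(Y|X) - H(U)
  &= I(X;Y) - I(U;X,Y,V) \\
  &= I(X;Y) - I(U;X,Y) - I(U;V|X,Y) \\
  &= I(X;Y) - I(U;V|X,Y).
\end{align*}
```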

B. Sketch of achievability proof of Theorem III.2

Consider $\mathcal{Q} \in \mathbb{Q}$ that achieves the maximum in equation
(10). There exists a $\delta > 0$ and a rate $R > 0$ such that:

$$R \geq I(U,Y;V|W) + \delta, \tag{27}$$

$$R \leq I(W;Y) + I(V;Y|W) - \delta = I(W,V;Y) - \delta. \tag{28}$$
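The existence of such a pair $(\delta, R)$ follows from the positivity of the information constraint of Theorem III.2. As a sketch of this step, the gap between the upper and the lower bound on the rate $R$ can be expanded with the chain rule:

```latex
\begin{align*}
I(W,V;Y) - I(U,Y;V|W)
  &= I(W;Y) + I(V;Y|W) - I(Y;V|W) - I(U;V|W,Y) \\
  &= I(W;Y) - I(U;V|W,Y).
\end{align*}
```

When this quantity is positive, any $\delta > 0$ smaller than half of it leaves a nonempty interval of admissible rates.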





















We define a block-Markov random code $c \in \mathcal{C}(n)$ over $B \in \mathbb{N}$
blocks of length $n \in \mathbb{N}$.

• Random codebook. We generate $|\mathcal{M}| = 2^{nR}$ sequences
$W^n(m)$ drawn from $\mathcal{Q}_w^{\otimes n}$ with index $m \in \mathcal{M}$. For each
index $m \in \mathcal{M}$, we generate the same number $|\mathcal{M}| = 2^{nR}$
of sequences $V^n(m,\hat{m})$ with index $\hat{m} \in \mathcal{M}$, drawn from
$\mathcal{Q}_{v|w}^{\otimes n}$ depending on $W^n(m)$.

• Encoding function. It recalls $m_{b-1}$ and finds $m_b \in \mathcal{M}$ s.t.
sequences $(U^n_{b-1}, Y^n_{b-1}, W^n(m_{b-1}), V^n(m_{b-1},m_b)) \in A_{\varepsilon}^{\star n}(\mathcal{Q})$
are jointly typical in block $b-1$. It deduces
$W^n(m_b)$ for block $b$ and sends $X^n_b$ drawn from $\mathcal{Q}_{x|uw}^{\otimes n}$
depending on $(U^n_b, W^n(m_b))$.

• Decoding function. It recalls $m_{b-1}$ and finds $m_b \in \mathcal{M}$
s.t. sequences $(Y^n_b, W^n(m_b)) \in A_{\varepsilon}^{\star n}(\mathcal{Q})$ and
$(Y^n_{b-1}, W^n(m_{b-1}), V^n(m_{b-1},m_b)) \in A_{\varepsilon}^{\star n}(\mathcal{Q})$ are
jointly typical. It returns $V^n(m_{b-1},m_b)$ over block $b-1$.

• First block at the encoder. An arbitrary index $m_1 \in \mathcal{M}$
of $W^n(m_1) \in \mathcal{W}^n$ is given to the encoder and
the decoder. The encoder sends $X^n_{b_1}$ drawn from $\mathcal{Q}_{x|uw}^{\otimes n}$
depending on $(U^n_{b_1}, W^n(m_1))$. At the beginning of the
second block $b_2$, the encoder finds the index $m_2$ such that
$(U^n_{b_1}, Y^n_{b_1}, W^n(m_1), V^n(m_1,m_2)) \in A_{\varepsilon}^{\star n}(\mathcal{Q})$. It sends
$X^n_{b_2}$ drawn from $\mathcal{Q}_{x|uw}^{\otimes n}$ depending on $(U^n_{b_2}, W^n(m_2))$.

• First block at the decoder. At the end of the
second block $b_2$, the decoder finds the index
$m_2$ such that $(Y^n_{b_2}, W^n(m_2)) \in A_{\varepsilon}^{\star n}(\mathcal{Q})$ and
$(Y^n_{b_1}, W^n(m_1), V^n(m_1,m_2)) \in A_{\varepsilon}^{\star n}(\mathcal{Q})$. Over the first
block, the decoder $\mathcal{D}$ returns $V^n(m_1,m_2) \in \mathcal{V}^n$. Sequences
$(U^n_{b_1}, W^n(m_1), X^n_{b_1}, Y^n_{b_1}, V^n(m_1,m_2)) \in A_{\varepsilon}^{\star n}(\mathcal{Q})$ are
jointly typical over the first block $b_1$.

• Last block. The sequences are not jointly typical.

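Equations (27), (28) imply that, for all $n \geq \bar{n}$ and for a large number
of blocks $B \in \mathbb{N}$, the sequences are jointly typical with large
probability:

$$\mathbb{E}_c \Big[ \mathcal{P}\big( U^n_b \notin A_{\varepsilon}^{\star n}(\mathcal{Q}) \big) \Big] \leq \varepsilon,$$

$$\mathbb{E}_c \Big[ \mathcal{P}\big( \forall m \in \mathcal{M}, \; (U^n_{b-1}, Y^n_{b-1}, W^n(m_{b-1}), V^n(m_{b-1},m)) \notin A_{\varepsilon}^{\star n}(\mathcal{Q}) \big) \Big] \leq \varepsilon,$$

$$\mathbb{E}_c \Big[ \mathcal{P}\big( \exists m' \neq m, \text{ s.t. } \{ (Y^n_b, W^n(m')) \in A_{\varepsilon}^{\star n}(\mathcal{Q}) \} \cap \{ (Y^n_{b-1}, W^n(m_{b-1}), V^n(m_{b-1},m')) \in A_{\varepsilon}^{\star n}(\mathcal{Q}) \} \big) \Big] \leq \varepsilon.$$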

C. Sketch of converse proof of Theorem III.2

Consider a code $c \in \mathcal{C}(n)$ with small error probability $\mathcal{P}_e(c)$.

$$\begin{aligned}
0 &\leq \sum_{i=1}^n I(U^{i-1},Y^{i-1},Y^n_{i+1};Y_i) - \sum_{i=1}^n I(Y^n_{i+1};U_i,Y_i|U^{i-1},Y^{i-1}) && (29) \\
&= \sum_{i=1}^n I(U^{i-1},Y^{i-1};Y_i) - \sum_{i=1}^n I(Y^n_{i+1};U_i|U^{i-1},Y^{i-1},Y_i) && (30) \\
&= \sum_{i=1}^n I(U^{i-1},Y^{i-1};Y_i) - \sum_{i=1}^n I(Y^n_{i+1},V_i;U_i|U^{i-1},Y^{i-1},Y_i) && (31) \\
&\leq \sum_{i=1}^n I(U^{i-1},Y^{i-1};Y_i) - \sum_{i=1}^n I(V_i;U_i|U^{i-1},Y^{i-1},Y_i) && (32) \\
&= \sum_{i=1}^n I(W_i;Y_i) - \sum_{i=1}^n I(V_i;U_i|W_i,Y_i) && (33) \\
&\leq n \cdot \max_{\mathcal{Q} \in \mathbb{Q}} \Big( I(W;Y) - I(V;U|W,Y) \Big). && (34)
\end{aligned}$$

Eq. (29), (30) are due to the Csiszár sum identity and the properties of the mutual information.
Eq. (31) is due to the non-causal decoding function $V^n = g(Y^n)$,
that implies $I(V_i;U_i|U^{i-1},Y^{i-1},Y_i,Y^n_{i+1}) = 0$.
Eq. (32) is due to the properties of the mutual information.
Eq. (33) is due to the introduction of the auxiliary random
variables $W_i = (U^{i-1},Y^{i-1})$ satisfying the properties of the set $\mathbb{Q}$.
Eq. (34) comes from taking the maximum over the set $\mathbb{Q}$.


$$U_i \text{ is independent of } W_i, \tag{35}$$

$$Y_i -\!\!\circ\!\!- X_i -\!\!\circ\!\!- (U_i,W_i), \tag{36}$$

$$V_i -\!\!\circ\!\!- (U_i,Y_i,W_i) -\!\!\circ\!\!- X_i. \tag{37}$$

• Eq. (35) is due to the i.i.d. property of the source that
implies $U_i$ is independent of $U^{i-1}$. The causal encoding with
feedback $X_i = f_i(U^{i},Y^{i-1})$ and the memoryless property of
the channel imply that $Y^{i-1}$ is independent of $U_i$.

• Eq. (36) comes from the memoryless property of the
channel and the fact that $Y_i$ is not included in $W_i$.

• Eq. (37) comes from the causal encoding with feedback
function that implies that $X_i$ is a deterministic function of
$(U^{i},Y^{i-1})$, which is included in $(U_i,Y_i,W_i)$.